1 Compiling performance models from parallel programs

Arjan J. C. van Gemund
Proceedings of the 8th international conference on Supercomputing July 1994 

A technique is described to automatically compile performance models in the course of program translation. The performance models are 
fully symbolic in order to preserve as much diagnostic information as possible. Although compiled statically, the models account for the
effects of resource contention, due to the introduction of a novel algorithm within the symbolic compilation scheme. It is shown that the 
compilation approach fundamentally outperforms traditional static estimation proced ... 



2 Induction of models under uncertainty 80% 

P. Cheeseman
Proceedings of the ACM SIGART international symposium on Methodologies for intelligent systems December 1986

This paper outlines a procedure for performing induction under uncertainty. This procedure uses a probabilistic representation and uses
Bayes' theorem to decide between alternative hypotheses (theories). This procedure is illustrated by a robot with no prior world
experience performing induction on data it has gathered about the world. The particular inductive problem is the formation of class
descriptions both for the tutored and untutored cases. The resulting class definitions are inherently p ...



3 Learning probabilistic prediction functions 77% 
Alfredo DeSantis , George Markowsky , Mark N. Wegman 

Proceedings of the first annual workshop on Computational learning theory December 1988 



4 On (un) predictability of formal languages (Extended Abstract) 77% 

A. Ehrenfeucht , G. Rozenberg
Proceedings of seventh annual ACM symposium on Theory of computing May 1975 

Formal language theory deals with a variety of classes of languages. Some of these are abstracting features of languages used for 
communication (as e.g., natural languages, programming languages or languages used in logic), some of them are abstracting features of 
languages used for description of processes (as e.g. basic classes of L languages) and still others are considered for mathematical 
reasons. Can we have a criterion for deciding whether a language can serve as a "communicati ... 



5 LeZi-update: an information-theoretic approach to track mobile users in PCS networks 77% 

Amiya Bhattacharya , Sajal K. Das
Proceedings of the fifth annual ACM/IEEE international conference on Mobile computing and networking August 1999 



6 Minimax regret under log loss for general classes of experts 77% 

Nicolo Cesa-Bianchi , Gabor Lugosi
Proceedings of the twelfth annual conference on Computational learning theory July 1999 



7 Dynamic programming alignment accuracy 77% 

Ian Holmes , Richard Durbin
Proceedings of the second annual international conference on Computational molecular biology March 1998 
8 Universal portfolios with and without transaction costs
Avrim Blum , Adam Kalai 

Proceedings of the tenth annual conference on Computational learning theory July 1997 

9 The role of APL in a technical language as illustrated by a modest battle management program 77%

John C. McInturff

ACM SIGAPL APL Quote Quad , Proceedings of the international conference on APL: APL in transition January 1987 
Volume 17 Issue 4 



10 How to use expert advice 77% 

Nicolo Cesa-Bianchi , Yoav Freund , David Haussler , David P. Helmbold , Robert E. Schapire , Manfred K. Warmuth
Journal of the ACM (JACM) May 1997 
Volume 44 Issue 3 

We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called experts. Our 
analysis is for worst-case situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We 
measure the performance of the algorithm by the difference between the expected number of mistakes it makes on the bit sequence and 
the expected number of mistakes made by the best expert on this sequence, w ... 



11 Predicting nearly as well as the best pruning of a decision tree 77% 

David P. Helmbold , Robert E. Schapire
Proceedings of the eighth annual conference on Computational learning theory July 1995 



12 A game of prediction with expert advice 77% 

V. G. Vovk
Proceedings of the eighth annual conference on Computational learning theory July 1995 

13 Arithmetic coding for data compression 77% 

Ian H. Witten , Radford M. Neal , John G. Cleary
Communications of the ACM June 1987 
Volume 30 Issue 6 

The state of the art in data compression is arithmetic coding, not the better-known Huffman method. Arithmetic coding gives greater 
compression, is faster for adaptive models, and clearly separates the model from the channel encoding. 

14 Choosing a reliable hypothesis 77% 

William Evans , Sridhar Rajagopalan , Umesh Vazirani
Proceedings of the sixth annual conference on Computational learning theory August 1993 

15 How to use expert advice 77% 

Nicolo Cesa-Bianchi , Yoav Freund , David P. Helmbold , David Haussler , Robert E. Schapire , Manfred K. Warmuth
Proceedings of the twenty-fifth annual ACM symposium on Theory of computing June 1993 
1 Monte Carlo estimation of Bayesian robustness

Jin Wang , Bruce W. Schmeiser
Proceedings of the 25th conference on Winter simulation December 1993 



2 A Bayesian/information theoretic model of bias learning 84% 

Jonathan Baxter
Proceedings of the ninth annual conference on Computational learning theory January 1996 
1 Computational learning theory: survey and selected bibliography 84%

Dana Angluin
Proceedings of the twenty-fourth annual ACM symposium on Theory of computing July 1992



2 Uniform-distribution attribute noise learnability 83%

Nader H. Bshouty , Jeffrey C. Jackson , Christino Tamon
Proceedings of the twelfth annual conference on Computational learning theory July 1999



3 Learnability and the Vapnik-Chervonenkis dimension 83% 

Anselm Blumer , A. Ehrenfeucht , David Haussler , Manfred K. Warmuth
Journal of the ACM (JACM) October 1989 
Volume 36 Issue 4 

Valiant's learnability model is extended to learning classes of concepts defined by regions in Euclidean space E^n. The methods in this
paper lead to a unified treatment of some of Valiant's results, along with previous results on distribution-free convergence of certain 
pattern recognition algorithms. It is shown that the essential condition for distribution-free learnability is finiteness of the Vapnik- 
Chervonenkis dimension, a simple combinatorial par ... 



4 How to use expert advice 82% 

Nicolo Cesa-Bianchi , Yoav Freund , David Haussler , David P. Helmbold , Robert E. Schapire , Manfred K. Warmuth
Journal of the ACM (JACM) May 1997 
Volume 44 Issue 3 

We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called experts. Our 
analysis is for worst-case situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We 
measure the performance of the algorithm by the difference between the expected number of mistakes it makes on the bit sequence and 
the expected number of mistakes made by the best expert on this sequence, w ... 



5 Simulating access to hidden information while learning 82% 

Peter Auer , Philip M. Long
Proceedings of the twenty-sixth annual ACM symposium on Theory of computing May 1994 



6 On prediction of individual sequences relative to a set of experts in the presence of noise 80% 

Tsachy Weissman , Neri Merhav
Proceedings of the twelfth annual conference on Computational learning theory July 1999 



7 On the learnability of discrete distributions 80% 

Michael Kearns , Yishay Mansour , Dana Ron , Ronitt Rubinfeld , Robert E. Schapire , Linda Sellie
Proceedings of the twenty-sixth annual ACM symposium on Theory of computing May 1994

8 How to use expert advice 80%

Nicolo Cesa-Bianchi , Yoav Freund , David P. Helmbold , David Haussler , Robert E. Schapire , Manfred K. Warmuth
Proceedings of the twenty-fifth annual ACM symposium on Theory of computing June 1993 

9 On restricted-focus-of-attention learnability of Boolean functions 80% 

Andreas Birkendorf , Eli Dichterman , Jeffrey Jackson , Norbert Klasner , Hans Ulrich Simon
Proceedings of the ninth annual conference on Computational learning theory January 1996 



10 Additive versus exponentiated gradient updates for linear prediction 80% 

Jyrki Kivinen , Manfred K. Warmuth
Proceedings of the twenty-seventh annual ACM symposium on Theory of computing May 1995 



11 On-line learning of functions of bounded variation under various sampling schemes 80% 

S. E. Posner , S. R. Kulkarni
Proceedings of the sixth annual conference on Computational learning theory August 1993 

12 Journal Backlog Report 77% 

Mark A. Weiss
ACM SIGACT News June 1995 
Volume 26 Issue 2 

13 System-level power optimization: techniques and tools 77% 

Luca Benini , Giovanni de Micheli
ACM Transactions on Design Automation of Electronic Systems (TODAES) April 2000
Volume 5 Issue 2

This tutorial surveys design methods for energy-efficient system-level design. We consider electronic systems consisting of a hardware
platform and software layers. We consider the three major constituents of hardware that consume energy, namely computation,
communication, and storage units, and we review methods of reducing their energy consumption. We also study models for analyzing the
energy cost of software, and methods for energy-efficient software design and compilation. This survey ...

14 On PAC learning using Winnow, Perceptron, and a Perceptron-like algorithm 77% 

Rocco A. Servedio
Proceedings of the twelfth annual conference on Computational learning theory July 1999 

15 Learning fixed-dimension linear thresholds from fragmented data 77% 

Paul W. Goldberg
Proceedings of the twelfth annual conference on Computational learning theory July 1999 

16 Neural networks and efficient associative memory 77% 

Matthias Miltrup , Georg Schnitger
Proceedings of the eleventh annual conference on Computational learning theory July 1998 

17 Latent semantic indexing: a probabilistic analysis 77% 

Christos H. Papadimitriou , Hisao Tamaki , Prabhakar Raghavan , Santosh Vempala
Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems May 1998 



18 Learning Markov chains with variable memory length from noisy output 77% 

Dana Angluin , Miklos Csuros
Proceedings of the tenth annual conference on Computational learning theory July 1997 



19 Competitive solutions for online financial problems 77% 

Ran El-Yaniv
ACM Computing Surveys (CSUR) March 1998
Volume 30 Issue 1

This article surveys results concerning online algorithms for solving problems related to the management of money and other assets. In
particular, the survey focuses on search, replacement, and portfolio selection problems.

20 Towards robust model selection using estimation and approximation error bounds 77%

Joel Ratsaby , Ronny Meir , Vitaly Maiorov
Proceedings of the ninth annual conference on Computational learning theory January 1996 
21 VC dimension of an integrate-and-fire neuron model 77%

Anthony M. Zador , Barak A. Pearlmutter
Proceedings of the ninth annual conference on Computational learning theory January 1996



22 Noise-tolerant distribution-free learning of general geometric concepts 77%

Nader H. Bshouty , Sally A. Goldman , H. David Mathias , Subhash Suri , Hisao Tamaki
Proceedings of the twenty-eighth annual ACM symposium on Theory of computing July 1996



23 On efficient agnostic learning of linear combinations of basis functions 77%

Wee Sun Lee , Peter L. Bartlett , Robert C. Williamson
Proceedings of the eighth annual conference on Computational learning theory July 1995



24 The perceptron algorithm vs. Winnow: linear vs. logarithmic mistake bounds when few input variables are relevant 77% 

Jyrki Kivinen , Manfred K. Warmuth
Proceedings of the eighth annual conference on Computational learning theory July 1995 



25 Markov decision processes in large state spaces 77%

Lawrence K. Saul , Satinder P. Singh
Proceedings of the eighth annual conference on Computational learning theory July 1995



26 From noise-free to noise-tolerant and from on-line to batch learning 77%

Norbert Klasner , Hans Ulrich Simon
Proceedings of the eighth annual conference on Computational learning theory July 1995



27 On the optimal capacity of binary neural networks: rigorous combinatorial approaches 77%

Jeong Han Kim , James R. Roche
Proceedings of the eighth annual conference on Computational learning theory July 1995



28 On genetic algorithms 77%

Eric B. Baum , Dan Boneh , Charles Garrett
Proceedings of the eighth annual conference on Computational learning theory July 1995



29 DNF - if you can't learn'em, teach'em: an interactive model of teaching 77%

H. David Mathias
Proceedings of the eighth annual conference on Computational learning theory July 1995

30 On the learnability and usage of acyclic probabilistic finite automata 77%

Dana Ron , Yoram Singer , Naftali Tishby
Proceedings of the eighth annual conference on Computational learning theory July 1995

31 Learning from a consistently ignorant teacher 77%

Michael Frazier , Sally Goldman , Nina Mishra , Leonard Pitt
Proceedings of the seventh annual conference on Computational learning theory July 1994 

One view of computational learning theory is that of a learner acquiring the knowledge of a teacher. We introduce a formal model of 
learning capturing the idea that teachers may have gaps in their knowledge. The goal of the learner is still to acquire the knowledge of the 
teacher, but now the learner must also identify the gaps. This is the notion of learning from a consistently ignorant teacher. We consider 
the impact of knowledge gaps on learning, for example, monotone DNF and d 

32 Learning probabilistic automata with variable memory length 77%

Dana Ron , Yoram Singer , Naftali Tishby
Proceedings of the seventh annual conference on Computational learning theory July 1994 

We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described 
by a subclass of probabilistic finite automata which we name Probabilistic Finite Suffix Automata. The learning algorithm is motivated by 
real applications in man-machine interaction such as hand-writing and speech recognition. Conventionally used fixed memory Markov and 
hidden Markov models have either severe practical or theoretical drawba ... 

33 Bayesian inductive logic programming 77%

Stephen Muggleton
Proceedings of the seventh annual conference on Computational learning theory July 1994 

Inductive Logic Programming (ILP) involves the construction of first-order definite clause theories from examples and background 
knowledge. Unlike both traditional Machine Learning and Computational Learning Theory, ILP is based on lock-step development of 
Theory, Implementations and Applications. ILP systems have successful applications in the learning of structure-activity rules for drug 
design, semantic grammars rules, finite element mesh design rules and rules for prediction of protein ... 

34 Diversity-based inference of finite automata 77%

Ronald L. Rivest , Robert E. Schapire
Journal of the ACM (JACM) May 1994
Volume 41 Issue 3 

We present new procedures for inferring the structure of a finite-state automaton (FSA) from its input/output behavior, using access to 
the automaton to perform experiments. Our procedures use a new representation for finite automata, based on the notion of equivalence 
between tests. We call the number of such equivalence classes the diversity of the automaton; the diversity may be as small as the 
logarithm of the number of states of the automato ... 

35 Learning binary relations using weighted majority voting 77%

Sally A. Goldman , Manfred K. Warmuth
Proceedings of the sixth annual conference on Computational learning theory August 1993 



36 Worst-case quadratic loss bounds for a generalization of the Widrow-Hoff rule 77%

Nicolo Cesa-Bianchi , Philip M. Long , Manfred K. Warmuth
Proceedings of the sixth annual conference on Computational learning theory August 1993

37 Techniques for automatically correcting words in text 77%

Karen Kukich
ACM Computing Surveys (CSUR) December 1992 
Volume 24 Issue 4 

Research aimed at correcting words in text has focused on three progressively more difficult problems: (1) nonword error detection; (2) 
isolated-word error correction; and (3) context-dependent word correction. In response to the first problem, efficient pattern-matching
and n-gram analysis techniques have been developed for detecting strings that do not appear in a given word list. In response to the 
second problem, a variety of general and application-specific spelling cor ... 

38 Universal sequential learning and decision from individual data sequences 77%

Neri Merhav , Meir Feder
Proceedings of the fifth annual workshop on Computational learning theory July 1992

Sequential learning and decision algorithms are investigated, with various application areas, under a family of additive loss functions for
individual data sequences. Simple universal sequential schemes are known, under certain conditions, to approach optimality uniformly as
fast as n^-1 log n, where n is the sample size. For the case of finite-alphabet observations, the class of schemes that can be implemented by
finite-s ...

39 Computational strategies for object recognition 77%

Paul Suetens , Pascal Fua , Andrew J. Hanson
ACM Computing Surveys (CSUR) March 1992 
Volume 24 Issue 1 

This article reviews the available methods for automated identification of objects in digital images. The techniques are classified into 
groups according to the nature of the computational strategy used. Four classes are proposed: (1) the simplest strategies, which work on 
data appropriate for feature vector classification, (2) methods that match models to symbolic data structures for situations involving 
reliable data and complex models, (3) approaches that fit models to the photometry and ... 



40 Supervised adaptive resonance networks 

R. S. Baxter
Proceedings of the conference on Analysis of neural network applications May 1991

1 A similarity-based probability model for latent semantic indexing 80%

Chris H. Q. Ding
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval August 1999



2 Dynamic variable resolution in the quickscreen combat model 77%

John B. Gilmer
Proceedings of the 15th conference on Winter Simulation - Volume 2 December 1983 

The Quickscreen combat simulation has a scope of Corps level and dynamic resolution from division to battalion level and from 3.5 to 25 
km. This allows a significant performance improvement. Disaggregation occurs when a unit enters an area near enemy units. It is broken 
down into subordinate units at a higher level of resolution. A physical space representation that treats resolution as a dimension supports 
this treatment. As the area of contact shifts due to the course of the battle, the r ... 

3 Distributional clustering of words for text classification 77%

L. Douglas Baker , Andrew Kachites McCallum
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval August 1998



4 View volume culling using a probabilistic caching scheme 77%

Mel Slater , Yiorgos Chrysanthou
Proceedings of the ACM symposium on Virtual reality software and technology September 1997

5 Towards language independent automated learning of text categorization models 77%

Chidanand Apte , Fred Damerau , Sholom M. Weiss
Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval August 1994
Searching for PHRASE relevance vector machine.

3 documents found. Order: citations weighted by year. 

The Relevance Vector Machine - Tipping (2000) (24 citations)
The Relevance Vector Machine Michael E. Tipping Microsoft Research
ftp.research.microsoft.com/users/mtipping/rvm_nips.ps.gz

One or more of the query terms is very common - only partial results have been returned.

Sparsity vs. Large Margins for Linear Classifiers - Herbrich, Graepel, Shawe-Taylor (2000)
the Support Vector Machine [4] or the Relevance Vector Machine [12]. In contrast to previous studies we
www.learningtheory.org/colt2000/papers/HerbrichGraepelShaweTaylor.ps

Generalisation Error Bounds for Sparse Linear Classifiers - Graepel, Herbrich.. (2000)

like the Support Vector Machine, the Relevance Vector Machine and K-nearest-neighbour. The bounds are 

www.learningtheory.org/colt2000/papers/GraepelHerbrichShaweTaylor.ps 


17 documents found. Order: citations weighted by year. 

The Relevance Vector Machine - Tipping (2000) (24 citations)

a prior over the weights governed by a set of hyperparameters, one associated with each weight, whose most 
number of disadvantages, notably the absence of probabilistic outputs, the requirement to estimate a 
w_n K(x, x_n) + w_0 (1) where {w_n} are the model weights and K(.,.) is a kernel function. The key
ftp.research.microsoft.com/users/mtipping/rvm_nips.ps.gz 

Sparse Bayesian Learning and the Relevance Vector Machine - Tipping (2001) (2 citations)

over the model weights governed by a set of hyperparameters, one associated with each weight, whose most 

(SVM)We demonstrate that by exploiting a probabilistic Bayesian learning framework, we can derive 

= w^T φ(x) (1) where the output is a linearly-weighted sum of M generally nonlinear and fixed, basis

www.jmlr.org/papers/volume1/tipping01a/tipping01a.ps.gz 

Bayesian Methods for Mixtures of Experts - Waterhouse, MacKay, Robinson (1996) (16 citations)
the parameters q of the model, where a are the hyperparameters of the prior. Given a set of priors we may
,wj ) is the output of expert j, giving a probabilistic mixture model. In this paper we restrict the
uncertainty. The use of regularisation or "weight decay" corresponds to the prior assumption that
svr-www.eng.cam.ac.uk/~ajr/GroupPubs/WaterhouseMacKayRobinson96-nips.ps

Time Series Prediction Based On The Relevance Vector Machine - With Adaptive Kernels

1 2 j !2 j 2) where j is the hyperparameter that governs the prior defined over the 

Vector Machine (RVM) introduced by Tipping is a probabilistic model similar to the widespread Support 

basis functions and {ω_j} are the model weights. Unlike in the Support Vector Machines (SVM)

isp.imm.dtu.dk/staff/jqc/./pub/icassp2002.ps.gz 

Informed Projections - Cohn (2002)

odds with each other, so we must introduce a hyperparameter b representing how much weight to place on
they include Latent Semantic Analysis (LSA), Probabilistic LSA, Principal Components Analysis (PCA), the
a hyperparameter b representing how much weight to place on accurately reproducing the original
www.cs.cmu.edu/~cohn/papers/nips02.pdf

Bayesian Support Vector Regression Using a Unified Loss function - Wei Chu Chuwei

non-Bayesian models are the ability to infer hyperparameters All the correspondences should be 

to compute predictive distribution using the probabilistic framework. Moreover, Bayesian model selection 

considers probability distributions in the weight space of the network. Together with the observed 

www.gatsby.ucl.ac.uk/~chuwei/paper/bisvr.ps.gz

Bayesian Trigonometric Support Vector Classifier - Wei Chu Engp

approximation can be applied to implement hyperparameter inference in section 4 we discuss 

Moreover, Bayesian methods can also provide probabilistic class prediction that is more desirable than 

MacKay's evidence framework (MacKay, 1992) using a weight-space interpretation. The unnormalized evidence 

www.gatsby.ucl.ac.uk/~chuwei/paper/btsvc.ps.gz

Bayesian Approach To Support Vector Machines - Chu Wei Master
. 38 3.1.1.2 Level 2: Hyperparameter Inference .
www.gatsby.ucl.ac.uk/~chuwei/paper/thesis.ps.gz

A New Bayesian Design Method For Support Vector Classification - Wei Chu Sathiya

in the prior distribution, as #the hyperparameter vector. Thus, for a given hyperparameter 

Moreover, Bayesian methods can also provide probabilistic class prediction that is more desirable than 

built up MacKay's evidence framework [4] using a weight-space interpretation. The unnormalized evidence 

guppy.mpe.nus.edu.sg/~chuwei/paper/btsvc_iconip.pdf
Bayesian Support Vector Regression Using a Unified Loss Function - Chu, Keerthi, Ong
This not only builds the ability to infer hyperparameters in Bayesian framework but also provides
over other non-Bayesian models is the explicit probabilistic formulation. This not only builds the ability
considers probability distributions in the weight space of the network. Together with the observed
guppy.mpe.nus.edu.sg/~chuwei/paper/ieeebisvr.pdf

Bayesian Inference in Support Vector Regression - Chu, Keerthi, Ong
18 5. Hyperparameter Inference 
11 3. Probabilistic Framework 

evidence framework (MacKay, 1992) using a weight-space interpretation, Seeger (1999) presented a 
guppy.mpe.nus.edu.sg/~mpessk/papers/bisvr.pdf

Bayesian Inference in Trigonometric Support Vector Classifier - Chu, Keerthi, Ong
Relevance Determination, Gaussian Processes, Hyperparameter Tuning, and Model Selection iii Table
3. Probabilistic Framework

MacKay's evidence framework (MacKay, 1992) using a weight-space interpretation. The unnormalized evidence
guppy.mpe.nus.edu.sg/~mpessk/papers/bitsvc.pdf

Bayesian Models for Non-Linear Autoregressions - Peter Muller, Mike West..

i )2 2V )2) Here V is an additional hyperparameter, and N(x m s) indicates that the random 

Bayesian learning framework, and delivers full probabilistic measures of relevant uncertainties as well as 

Conditional expectations take the form of locally weighted mixtures of linear (auto-)regressions. The

ftp.isds.duke.edu/pub/WorkingPapers/94-30.ps 